NSF PAR Search | NSF Public Access Repository


APOLLO: SGD-like Memory, AdamW-level Performance

Zhu, H; Zhang, Z; Cong, W; Liu, X; Park, S; Chandra, V; Long, B; Pan, D Z; Wang, Z; Lee, J (February 2025, https://doi.org/10.48550/arXiv.2412.05270)

Large language models (LLMs) are notoriously memory-intensive during training, particularly with the popular AdamW optimizer. This memory burden necessitates using more or higher-end GPUs or reducing batch sizes, limiting training scalability and throughput. To address this, various memory-efficient optimizers have been proposed to reduce optimizer memory usage. However, they face critical challenges: (i) reliance on costly SVD operations; (ii) significant performance trade-offs compared to AdamW; and (iii) optimizer memory overhead that remains substantial if competitive performance is to be maintained. In this work, we identify that AdamW's learning rate adaptation rule can be effectively coarsened as a structured learning rate update. Based on this insight, we propose Approximated Gradient Scaling for Memory-Efficient LLM Optimization (APOLLO), which approximates learning rate scaling using an auxiliary low-rank optimizer state based on pure random projection. This structured learning rate update rule makes APOLLO highly tolerant to further memory reductions while delivering comparable pre-training performance. Even its rank-1 variant, APOLLO-Mini, achieves superior pre-training performance compared to AdamW with SGD-level memory costs. Extensive experiments demonstrate that the APOLLO series performs on par with or better than AdamW, while achieving greater memory savings by nearly eliminating the optimization states of AdamW. These savings provide significant system-level benefits: (1) Enhanced Throughput: 3x throughput on an 8xA100-80GB setup compared to AdamW by supporting 4x larger batch sizes. (2) Improved Model Scalability: Pre-training LLaMA-13B with naive DDP on A100-80GB GPUs without system-level optimizations. (3) Low-End GPU Friendly Pre-training: Pre-training LLaMA-7B on a single GPU using less than 12 GB of memory with weight quantization.
Free, publicly-accessible full text available February 17, 2026
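The APOLLO abstract above describes keeping AdamW-style moments only in a low-rank space obtained by random projection, and using them to derive a structured scaling of the full gradient. The following is a minimal sketch of that idea, not the authors' implementation. The function name `apollo_like_step`, the Gaussian projection, the per-column norm-ratio scaling, and all hyperparameter defaults are assumptions made here for illustration; the abstract does not specify them.

```python
import numpy as np

def apollo_like_step(W, G, state, lr=1e-3, rank=4,
                     beta1=0.9, beta2=0.999, eps=1e-8, seed=0):
    """One illustrative update of weights W (m x n) given gradient G (m x n).

    Hypothetical sketch: Adam moments live only in a (rank x n) projected
    space; the full gradient is rescaled per column rather than per element.
    """
    m, n = G.shape
    if "P" not in state:
        rng = np.random.default_rng(seed)
        # Fixed Gaussian random projection (rank x m); no SVD is computed.
        # (Assumption: it is stored here; it could instead be regenerated
        # from the seed so that it does not count as optimizer state.)
        state["P"] = rng.standard_normal((rank, m)) / np.sqrt(rank)
        state["m"] = np.zeros((rank, n))  # first moment, low-rank space
        state["v"] = np.zeros((rank, n))  # second moment, low-rank space
        state["t"] = 0

    state["t"] += 1
    t = state["t"]

    # Project the gradient into the low-rank space.
    R = state["P"] @ G

    # Standard Adam moment updates with bias correction, on R only.
    state["m"] = beta1 * state["m"] + (1 - beta1) * R
    state["v"] = beta2 * state["v"] + (1 - beta2) * R ** 2
    m_hat = state["m"] / (1 - beta1 ** t)
    v_hat = state["v"] / (1 - beta2 ** t)
    R_adapted = m_hat / (np.sqrt(v_hat) + eps)

    # Assumed structured scaling: one factor per column, estimated as the
    # ratio of the adapted to the raw projected gradient norm.
    scale = np.linalg.norm(R_adapted, axis=0) / (np.linalg.norm(R, axis=0) + eps)

    # Apply the coarse scaling to the full-rank gradient (broadcast over rows).
    return W - lr * G * scale
```

In this sketch the two moment buffers hold 2 x rank x n values instead of the 2 x m x n that element-wise AdamW moments would need, which is the kind of saving the abstract refers to. Setting `rank=1` mimics the spirit of the rank-1 variant mentioned above, though the exact scaling granularity used by APOLLO-Mini is not stated in the abstract.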
Creating and Assessing an Upper Division Additive Manufacturing Course and Laboratory to Enhance Undergraduate Research and Innovation

Maloney, P; Cong, W; Zhang, M; Li, B. (June 2019, Proceedings of the American Society for Engineering Education)

Additive manufacturing (AM) is prevalent in academic, industrial, and layperson use for the design and creation of objects via joining materials together in a layer upon layer fashion. However, few universities have an undergraduate course dedicated to it. Thus, using NSF IUSE support [grant number redacted for review] from the Exploration and Design Tier of the Engaged Student Learning Track, this project has created and implemented such a course at three large universities: Texas Tech (a Carnegie high research productivity and Hispanic Serving Institution), Kansas State (a Carnegie high research productivity and land grant university) and California State, Northridge (the largest of all the California State campuses and highly ranked in serving underprivileged students). Our research team includes engineering professors and a sociologist trained in assessment and K-12 outreach to determine the effects of the course on the undergraduate and high school students. We are currently in year two of the three years of NSF support.
Full Text Available
Creating and Assessing an Upper Division Additive Manufacturing Course and Laboratory to Enhance Undergraduate Research and Innovation

Maloney, P; Li, B; Zhang, M; Cong, W. (June 2018, ASEE annual conference & exposition)

This NSF IUSE project is on the Exploration and Design Tier and the Engaged Student Learning Track. It is aimed at better preparing the country’s professional workforce in the renaissance of U.S. skilled manufacturing by creating new personnel proficient in additive manufacturing (AM). AM is mainstream; it has the potential to bring jobs back to the U.S. and add to the nation’s global competitiveness. AM is the process of joining materials to make objects from 3D data in a layer upon layer fashion. The objectives are to develop, assess, revise, and disseminate an upper division course and laboratory, “Additive Manufacturing,” and to advance undergraduate and K-12 student research and creative inquiry activities as well as faculty expertise at three diverse participating universities: Texas Tech, California State Northridge, and Kansas State. This research/pedagogical team contains a mechanical engineering professor at each university to develop and teach the course, as well as a sociologist trained in K-12 outreach, course assessment, and human subjects research to accurately determine the effects on K-12 and undergraduate students. The proposed course will cover extrusion-based, liquid-based, and powder-based AM processes. For each technology, fundamentals, applications, and advances will be discussed. Students will learn solutions to AM of polymers, metals, and ceramics. Two lab projects will be built to provide hands-on experiences on a variety of state-of-the-art 3D printers. To stimulate innovation, students will design, fabricate, and measure test parts, and will perform experiments to explore process limits and tackle real world problems. They will also engage K-12 students through video demonstrations and mentorship, thus developing presentation skills. 
Through the project, different pedagogical techniques and assessment tools will be utilized to assess and improve engineering education at both the undergraduate and K-12 levels through varied techniques: i) undergraduate module lesson plans that are scalable to K-12 levels, ii) short informational video lessons created by undergraduates for K-12 students with accompanying in-person mentorship activities at local high schools and MakerSpaces, iii) pre- and post-test assessments of undergraduates’ and K-12 participating students’ AM knowledge, skills, and perceptions of self-efficacy, and iv) focus groups to learn about student concerns/learning challenges. We will also track students institutionally and into their early careers to learn about their use of AM technology professionally.
Full Text Available
